An Efficient Recovery Mechanism with Checkpointing Approach for Cluster Federation

Author

  • Manoj Kumar
Abstract

Checkpoint and recovery protocols are commonly used to provide fault tolerance in distributed applications. A distributed system takes checkpoints from time to time so that it can survive arbitrary failures: when a failure occurs, the system rolls back to a set of checkpoints at which global consistency is preserved. Checkpointing is thus a fault-tolerance technique for recovering from faults and restarting jobs quickly. Checkpointing algorithms for distributed systems have been studied for years, and checkpointing with rollback recovery is widely used to let a distributed computation make progress in spite of failures. There are two fundamental approaches to checkpointing and recovery. In the asynchronous approach, processes take their checkpoints independently. Taking checkpoints is therefore very simple, but a recent consistent global checkpoint may not exist, which can force the computation to roll back further. The synchronous approach assumes that a single process, other than the application processes, periodically invokes the checkpointing algorithm to determine a consistent global checkpoint.
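
The difference between the two approaches can be illustrated with a small sketch. It is not the mechanism proposed in the paper; it is a minimal model under stated assumptions: each process numbers its local events 1, 2, 3, ..., a checkpoint taken at time c saves the state after event c, and every process has an initial checkpoint at time 0. The names Message, is_consistent, and latest_consistent_cut are hypothetical, introduced only for this illustration. A set of checkpoints (one per process) is treated as consistent when it contains no orphan message, that is, no message recorded as received whose send is not recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int     # index of the sending process
    send_time: int  # sender's local event number at the send
    receiver: int   # index of the receiving process
    recv_time: int  # receiver's local event number at the receive


def is_orphan(m, cut):
    # Received inside the receiver's checkpoint, but sent after the sender's.
    return m.recv_time <= cut[m.receiver] and m.send_time > cut[m.sender]


def is_consistent(cut, messages):
    """cut[p] is the checkpoint time chosen for process p."""
    return not any(is_orphan(m, cut) for m in messages)


def latest_consistent_cut(checkpoints, messages):
    """Start from each process's latest checkpoint and roll receivers back
    until no orphan message remains. checkpoints[p] is a sorted list of
    checkpoint times for process p and must start with 0."""
    cut = [cps[-1] for cps in checkpoints]
    changed = True
    while changed:
        changed = False
        for m in messages:
            if is_orphan(m, cut):
                # Roll the receiver back to its last checkpoint before the receive.
                cut[m.receiver] = max(
                    c for c in checkpoints[m.receiver] if c < m.recv_time
                )
                changed = True
    return cut


# Illustrative data (assumed, not from the paper): two processes that
# checkpoint independently while exchanging four messages.
checkpoints = [[0, 2, 5], [0, 3, 6]]
messages = [
    Message(sender=0, send_time=6, receiver=1, recv_time=5),
    Message(sender=1, send_time=4, receiver=0, recv_time=4),
    Message(sender=0, send_time=3, receiver=1, recv_time=2),
    Message(sender=1, send_time=1, receiver=0, recv_time=1),
]

print(is_consistent([5, 6], messages))               # latest independent checkpoints
print(latest_consistent_cut(checkpoints, messages))  # where recovery ends up
print(is_consistent([5, 4], messages))               # a cut chosen with coordination
```

With this data the latest independent checkpoints are not consistent, and each rollback creates a new orphan until both processes are back at their initial state. A coordinator that had made process 1 checkpoint at time 4 instead would have produced a consistent cut (5, 4), which is the kind of guarantee the synchronous approach pays coordination overhead to obtain.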

Similar resources

A New Roll-Forward Checkpointing / Recovery Mechanism for Cluster Federation

In this paper, we have addressed the complex problem of determining a recovery line for cluster federation and proposed an efficient checkpointing / recovery mechanism for it. The main objective of the proposed approach is to advance the recovery line in a cluster federation such that we can put a limit on the amount of rollback by the processes in all the clusters in case of failure(s) in the ...

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...

An Application-Transparent, Platform-Independent Approach to Rollback-Recovery for Mobile Agent Systems

This paper proposes a new approach to rollback-recovery for mobile-agent systems, and describes its implementation in the MESSENGERS mobile agents system. The used checkpointing method allows to implement space and time efficient, user-transparent rollback-recovery in heterogeneous distributed environments. Together with an efficient non-blocking system snapshot algorithm this checkpointing met...

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...

Domino-Effect Free Crash Recovery for Concurrent Failures in Cluster Federation

In this paper, we have addressed the complex problem of recovery for concurrent failures in cluster computing environment. We have proposed a new approach in which we have dealt with both inter cluster orphan and lost messages unlike the existing works. The proposed recovery approach is free from the domino-effect and hence guarantees the least amount of recomputation after recovery. Besides, a...

Publication date: 2014